[ML] Wait for autodetect to be ready in the datafeed #37349

droberts195 · 2019-01-11T09:58:48Z

This is a reinforcement of #37227. It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this. The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.

When an ML node restarts we need the datafeeds for any
jobs that were running on that node to wait not just
until the jobs are allocated, but until the autodetect
process of each job has started up. In the case of
reassignment of the job persistent task this was dealt
with by the stale status test. But in the case where a
node restarts and its persistent tasks are not
reassigned we need a deeper test.
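
The difference between the two tests described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `DatafeedReadinessSketch`, `JobTaskView`, `looksOpenInClusterState` and `readyForDatafeed` are hypothetical names, and the set of job IDs with a running process is an assumed stand-in for however the node tracks its live autodetect processes.

```java
import java.util.Set;

/** Hypothetical sketch; none of these names are taken from the Elasticsearch codebase. */
final class DatafeedReadinessSketch {

    private DatafeedReadinessSketch() {}

    /** Simplified stand-in for the job state recorded in the persistent task. */
    enum JobState { OPENING, OPENED, CLOSED }

    /** What cluster state reports about a job's persistent task (assumed shape). */
    static final class JobTaskView {
        final JobState state;
        final boolean statusStale;

        JobTaskView(JobState state, boolean statusStale) {
            this.state = state;
            this.statusStale = statusStale;
        }
    }

    /**
     * Shallow test: trusts cluster state only. It catches the reassignment
     * case, where the task status is marked stale, but not a restart that
     * the master node never noticed.
     */
    static boolean looksOpenInClusterState(JobTaskView task) {
        return task != null && task.state == JobState.OPENED && !task.statusStale;
    }

    /**
     * Deeper test: the job must look open in cluster state AND have an
     * autodetect process actually running on the node.
     */
    static boolean readyForDatafeed(JobTaskView task, String jobId, Set<String> jobsWithRunningProcess) {
        return looksOpenInClusterState(task) && jobsWithRunningProcess.contains(jobId);
    }
}
```

In the unnoticed-restart scenario the task still reads as opened and not stale, so only the second method reports that the datafeed should keep waiting.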

Fixes #36810

davidkyle

LGTM. I left one observation that doesn't need to be addressed in this change.

davidkyle · 2019-01-11T11:17:13Z

...rc/main/java/org/elasticsearch/xpack/ml/job/process/autodetect/AutodetectProcessManager.java

@@ -327,7 +337,8 @@ public void forecastJob(JobTask jobTask, ForecastParams params, Consumer<Excepti
    public void writeUpdateProcessMessage(JobTask jobTask, UpdateParams updateParams, Consumer<Exception> handler) {
        AutodetectCommunicator communicator = getOpenAutodetectCommunicator(jobTask);
        if (communicator == null) {
-            String message = "Cannot process update model debug config because job [" + jobTask.getJobId() + "] is not open";
+            String message = "Cannot process update model debug config because job [" + jobTask.getJobId() +


This message probably made sense once, but it doesn't anymore. I'd suggest:
Cannot update the job config because job...

I'll create a new PR to change that.

droberts195 added >bug v7.0.0 :ml Machine learning v6.6.0 v6.7.0 labels Jan 11, 2019

droberts195 requested a review from davidkyle January 11, 2019 09:58

davidkyle approved these changes Jan 11, 2019

droberts195 merged commit 1da59db into elastic:master Jan 11, 2019

droberts195 deleted the auto_com_check_in_datafeed branch January 11, 2019 13:22

droberts195 mentioned this pull request Jan 11, 2019

[ML] Update error message for process update #37363

Merged

davidkyle mentioned this pull request Jan 15, 2019

MlMigrationIT#testConfigMigration fails in 6.x #36935

Closed

droberts195 mentioned this pull request Jan 28, 2019

[ML] Consider unifying datafeed and job configuration #34231

Open

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
